Automatic Speech Recognition for Indoor HRI Scenarios
Authors
Abstract
This article presents a stand-alone automatic speech recognition system for HRI settings that accounts for listener movement, time-varying reverberation effects, and environmental noise, and that uses user position information in beamforming approaches. We stress the importance of replacing the classical black-box integration of speech recognition technology into applications with an approach that incorporates a representation and model of the acoustic environment and the direction of the target source. Test data were recorded on a real robot under various movement conditions. To address the channel problem by incorporating its effect during training, clean samples were passed through estimated static responses and noise was added. Beamforming is investigated assuming oracle tracking of the user, which could be obtained using, for instance, image processing. The proposed strategy should interest the robotics community because it allows the development of voice-based interaction with limited training, without relying on third-party technologies or Internet access, which eliminates the need to upload data to the cloud. In our mobile scenario, the resulting engine provided an average word error rate at least 19% and 34% lower than publicly available APIs with the playback (i.e., loudspeaker) and human testing modalities, respectively.
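The training-data step described in the abstract (clean samples passed through estimated static responses, with noise added) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name add_reverb_and_noise, its parameters, and the choice of a fixed target signal-to-noise ratio are assumptions made for illustration only, and the impulse response and noise arrays are assumed to be supplied by the caller.

```python
import numpy as np


def add_reverb_and_noise(clean, rir, noise, snr_db):
    """Hypothetical helper: reverberate a clean signal and add noise.

    clean  : 1-D array, clean speech samples
    rir    : 1-D array, a static impulse response (assumed already estimated)
    noise  : 1-D array, a noise recording with non-zero power
    snr_db : target signal-to-noise ratio in dB (an illustrative assumption)
    """
    clean = np.asarray(clean, dtype=float)
    rir = np.asarray(rir, dtype=float)
    noise = np.asarray(noise, dtype=float)

    # Apply the channel: linear convolution, truncated to the input length.
    reverberant = np.convolve(clean, rir)[: len(clean)]

    # Repeat or trim the noise so it covers the whole utterance.
    repeats = int(np.ceil(len(reverberant) / len(noise)))
    noise = np.tile(noise, repeats)[: len(reverberant)]

    # Scale the noise so that speech power / noise power matches snr_db.
    speech_power = np.mean(reverberant ** 2)
    noise_power = np.mean(noise ** 2)
    gain = np.sqrt(speech_power / (noise_power * 10.0 ** (snr_db / 10.0)))

    return reverberant + gain * noise
```

In a setup like this, each clean training utterance would typically be processed with several impulse responses and noise levels so that the acoustic model sees conditions closer to those met at test time.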
Similar resources
A Database for Automatic Persian Speech Emotion Recognition: Collection, Processing and Evaluation
Recent developments in robotics and automation have motivated researchers to improve the efficiency of interactive systems by making man-machine interaction more natural. Since speech is the most popular method of communication, recognizing human emotions from the speech signal has become a challenging research topic known as Speech Emotion Recognition (SER). In this study, we propose a Persian em...
Automatic speech recognition for children
In this paper, the acoustic and linguistic characteristics of children's speech are investigated in the context of automatic speech recognition. Acoustic variability is identified as a major hurdle in building high performance ASR applications for children. A simple speaker normalization algorithm combining frequency warping and spectral shaping introduced in [5] is shown to reduce acoustic variab...
Revisiting scenarios and methods for variable frame rate analysis in automatic speech recognition
In this paper we present a revision and evaluation of some of the main methods used in variable frame rate (VFR) analysis, applied to speech recognition systems. The work found in the literature in this area usually deals with restricted conditions and scenarios and we have revisited the main algorithmic alternatives and evaluated them under the same experimental framework, so that we have been...
Robust speech recognition in client-server scenarios
This paper addresses issues that are specific to the implementation of automatic speech recognition (ASR) applications and services in client-server scenarios. It is assumed in all of these scenarios that functionality in a human-machine dialog system is distributed between mobile client devices and network-based multi-user media and application servers. It is argued that, while there has alrea...
Journal
Journal title: ACM Transactions on Human-Robot Interaction
Year: 2021
ISSN: 2573-9522
DOI: https://doi.org/10.1145/3442629